Restaurants from all over the world can be found in Bengaluru. From the United States to Japan, Russia to Antarctica, you get all types of cuisine here. Delivery, dine-out, pubs, bars, drinks, buffets, desserts: you name it and Bengaluru has it. Bengaluru is a great place for foodies. The number of restaurants is increasing day by day and currently stands at approximately 12,000. Even with such a high number of restaurants, the industry hasn't been saturated yet, and new restaurants are opening every day. However, it has become difficult for them to compete with already established restaurants. The key issues that continue to pose a challenge to them include high real estate costs, rising food costs, a shortage of quality manpower, a fragmented supply chain and over-licensing. This analysis of Zomato data aims at analysing the demography of each location. Most importantly, it will help new restaurants decide their theme, menu, cuisine, cost etc. for a particular location. It also aims at finding similarity between neighborhoods of Bengaluru on the basis of food. The dataset also contains reviews for each restaurant, which will help in finding the overall rating for the place.

# Geocode every unique location name with Nominatim (geopy)
from geopy.geocoders import Nominatim

# Unique location names, skipping NaN (non-string) entries
location = [x for x in zomato['location'].unique().tolist() if isinstance(x, str)]
latitude = []
longitude = []
geolocator = Nominatim(user_agent="ny_explorer")

for i, name in enumerate(location):
    address = name + ', Bengaluru, India'
    for attempt in range(7):  # retry each address up to 7 times
        try:
            loc = geolocator.geocode(address)
            latitude.append(loc.latitude)
            longitude.append(loc.longitude)
            print('The geographical coordinates of the location are {}, {}.'.format(loc.latitude, loc.longitude))
            break
        except Exception:
            # geocode() failed or returned None; try again
            continue
    else:
        # every attempt failed: print the index and keep the address as a placeholder
        print(i)
        latitude.append(address)
        longitude.append(address)

Is online delivery available?

From the above donut chart we observed that 58.9% of restaurants accept online delivery.
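The share shown in a donut chart like this can be computed directly from the column. A minimal sketch on toy data, assuming the delivery flag is stored in a column named `online_order` with 'Yes'/'No' values (the column name and the sample values are assumptions for illustration, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for the Zomato frame; values are illustrative only
toy = pd.DataFrame({'online_order': ['Yes', 'No', 'Yes', 'Yes', 'No']})

# normalize=True returns fractions instead of raw counts
share = toy['online_order'].value_counts(normalize=True) * 100
print(share.round(1))
```

The same `share` series could be passed to a pie or donut plot as the wedge sizes.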

Is Table booking available?

From the above donut chart we observed that 12.5% of restaurants do not accept online table booking.

From the above donut chart we can see that "North Indian" and "North Indian, Chinese" were the most popular cuisine categories in Bengaluru.

From the above boxplots we can see that the boxplot for the False group contains an outlier, which is probably caused by noise in the data.
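Boxplots conventionally draw a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the quartiles. A minimal sketch of that rule on made-up numbers (the `iqr_outliers` helper and the sample values are hypothetical, not from the dataset):

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

sample = [300, 350, 400, 450, 500, 550, 6000]
print(iqr_outliers(sample))
```

Whether such a point is noise or a genuine extreme value still has to be judged from the data itself.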

From the above bar graphs we saw that Banashankari is the location with the highest number of restaurants.

Location-wise ratings of restaurants:

From the above graph we observed that a rating of 3.9 has the tallest bar, i.e. it is the most frequent rating in the location-wise ratings.

Number of Restaurants based on Services:

From the above we observed that Delivery has the tallest bar, followed by Dine-out, among the types of service.

From the above graph we can see that the 300-399 cost range has the tallest bar.
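Cost ranges such as 300-399 can be produced by binning a numeric cost column into fixed-width intervals. A sketch with `pandas.cut`, assuming a bin width of 100 and a cost series named `cost` (both are assumptions for illustration; the values are made up):

```python
import pandas as pd

costs = pd.Series([250, 300, 350, 399, 400, 450, 800], name='cost')

# Left-closed bins [200, 300), [300, 400), ... so 399 falls in 300-399
edges = list(range(200, 1000, 100))            # 200, 300, ..., 900
labels = [f'{lo}-{lo + 99}' for lo in edges[:-1]]
cost_range = pd.cut(costs, bins=edges, labels=labels, right=False)

# Count restaurants per range, keeping the ranges in ascending order
counts = cost_range.value_counts().sort_index()
print(counts)
```

Using `right=False` makes each bin include its lower edge, which matches labels of the form 300-399.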

From the above we saw that Cafe Coffee Day and Onesta have the tallest bars.

Handling missing values:

Replacing NAs with mean values
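Mean imputation can be sketched as follows. The frame and its column names (`rate`, `votes`) are toy assumptions for illustration, not the actual cleaning code of this notebook:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'rate': [4.0, np.nan, 3.0, 5.0],
                    'votes': [100, 250, np.nan, 50]})

# mean() skips NaN by default, so each column is filled with the
# mean of its observed values
filled = toy.fillna(toy.mean())
print(filled)
```

Mean imputation only makes sense for numeric columns; categorical columns would need a different strategy, such as the most frequent value.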

Dropping those columns/features before building the model
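A minimal sketch of this step on a toy frame. Which columns are dropped and which column is the target are assumptions here (`name`, `url`, `phone` as identifier-like columns, `rate` as the target), chosen only to illustrate the pattern:

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['A', 'B'],
    'url': ['u1', 'u2'],
    'phone': ['p1', 'p2'],
    'votes': [10, 20],
    'rate': [3.9, 4.1],
})

# Drop identifier-like columns that should not be used as features
features = toy.drop(columns=['name', 'url', 'phone'])

# Separate the predictors from the target
x = features.drop(columns=['rate'])
y = features['rate']
print(x.columns.tolist(), y.name)
```

`drop` returns a new frame, so the original data stays available for later exploration.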

Building Model

# Random forest with 500 shallow trees, scored with R^2 on the test set
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

RForest = RandomForestRegressor(n_estimators=500, random_state=None, min_samples_leaf=2, max_depth=6)
RForest.fit(x_train, y_train)
y_predict = RForest.predict(x_test)
r2_score(y_test, y_predict)

From the above we see that the Decision Tree Regressor reaches an R² score of 0.987 (98.7%) on the test set, which means the model explains most of the variance in the target.
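Note that `r2_score` reports the coefficient of determination, R² = 1 - SS_res / SS_tot, rather than a classification accuracy. A small sketch computing it by hand on made-up numbers (the `r2` helper is hypothetical and only mirrors the textbook formula):

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 4.0, 5.0]
print(r2(y_true, [3.0, 4.0, 5.0]))   # perfect predictions
print(r2(y_true, [4.0, 4.0, 4.0]))   # always predicting the mean
print(r2(y_true, [5.0, 4.0, 3.0]))   # worse than predicting the mean
```

A score of 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values are possible for models that do worse than that.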

# Linear regression baseline, scored with R^2 on the test set
from sklearn.ensemble import RandomForestClassifier
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

reg = LinearRegression()
reg.fit(x_train, y_train)
y_pred = reg.predict(x_test)
r2_score(y_test, y_pred)
# Random forest with 100 trees, scored with R^2 on the test set
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

RForest = RandomForestRegressor(n_estimators=100, random_state=None, min_samples_leaf=2)
RForest.fit(x_train, y_train)
y_predict = RForest.predict(x_test)
r2_score(y_test, y_predict)
# Compare several regressors on the same train/test split
import sklearn.metrics as sm
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

models = {
    'LR': LinearRegression(),
    'KNN': KNeighborsRegressor(),
    'DT': DecisionTreeRegressor(),
    'RF': RandomForestRegressor(),
}

# separate decision tree fitted on the full data set
model1 = DecisionTreeRegressor()
model1.fit(x, y)

for name, model in models.items():
    print(f'using:{name}')
    model.fit(x_train, y_train)
    y_test_pred = model.predict(x_test)
    print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
    print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
    print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
    print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
    print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))
    print('_' * 30)
# Visualise the individual trees of a small random forest
from io import StringIO

import pydotplus
from IPython.display import Image
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_graphviz

# create a random forest regressor
est = RandomForestRegressor(n_estimators=5)
est.fit(x_train, y_train)

# get the feature names
feature_cols = list(x.columns)

# export each tree as its own graph and write it to a separate PNG file
for i, tree in enumerate(est.estimators_):
    dot_data = StringIO()
    export_graphviz(tree, out_file=dot_data, filled=True, rounded=True,
                    special_characters=True, feature_names=feature_cols)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    graph.write_png(f'random_forest_tree_{i}.png')

# display the last tree that was rendered
Image(graph.create_png())
# Support vector regression with a linear kernel, scored with RMSE
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Create an SVR object
model = SVR(kernel='linear')

# Train the model using the training sets
model.fit(x_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(x_test)

# Calculate the root mean squared error as the square root of the MSE
# (the squared=False argument is deprecated in recent scikit-learn releases)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
print(f"RMSE: {rmse:.2f}")